Integrating machine learning into control systems for industrial facilities

ABSTRACT

Methods, systems, apparatus and computer program products for implementing machine learning within control systems are disclosed. An industrial facility setting slate can be received from a machine learning system and a determination can be made as to whether to adopt the settings in the industrial facility setting slate. The machine learning system can include a machine learning model that is a neural network, e.g., a deep neural network, that has been trained, e.g., using reinforcement learning, to predict a data center setting slate that is expected to optimize an efficiency of a data center.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of and claims priority to Patent Cooperation Treaty Application No. PCT/US2018/029611, for INTEGRATING MACHINE LEARNING INTO CONTROL SYSTEMS, filed on Apr. 26, 2018, which claims the benefit of the filing date of U.S. Provisional Application No. 62/490,544 for INTEGRATING MACHINE LEARNING INTO CONTROL SYSTEMS, filed on Apr. 26, 2017. The disclosures of the foregoing applications are incorporated here by reference.

BACKGROUND

This specification relates to integrating machine learning into control systems.

A machine learning model receives input and generates output based on its received input and on values of model parameters.

Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.

Neural networks can be trained using reinforcement learning to generate predicted outputs. Generally, in a reinforcement learning training technique, a reward is received and is used to adjust the values of the parameters of the neural network.

For example, a neural network trained using reinforcement learning can propose settings for an industrial facility, such as a data center, which is a facility that holds computer servers for remote storage, processing, or distribution of large amounts of data.

SUMMARY

This specification describes technologies for machine learning systems in general, and specifically systems and methods for directly controlling physical infrastructure of industrial facilities using a machine learning system.

In general, one innovative aspect of the subject matter described in this specification can be embodied in a method that uses machine learning to control physical infrastructure of industrial facilities.

Other embodiments of this aspect include corresponding computer systems, apparatus, and computer programs recorded on one or more computer storage devices, each configured to perform the actions of the methods. For a system of one or more computers to be configured to perform particular operations means that the system has installed on it software, firmware, hardware, or a combination of them that in operation cause the system to perform the operations or actions. For one or more computer programs to be configured to perform particular operations or actions means that the one or more programs include instructions that, when executed by data processing apparatus, cause the apparatus to perform the operations or actions.

The foregoing and other embodiments can each optionally include one or more of the following features, alone or in combination. In particular, one embodiment includes all the following features in combination.

An example method includes receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility setting slate defines a respective setting for each of a plurality of industrial facility controls; determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility (e.g., whether the industrial facility will, if the settings predicted by a prediction model are adopted, operate in accordance with pre-determined criteria that determine a safe environment for the industrial facility); and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.

Here the term “optimize” is used to mean improving the efficiency of the industrial facility with respect to an efficiency criterion. “Optimization of efficiency” does not necessarily imply that the settings defined by the industrial facility setting slate are the settings which provide the absolute maximum of efficiency with respect to all possible values of the settings, but rather the term may mean that the efficiency according to the efficiency criterion is greater for the industrial facility setting slate than for at least one other possible industrial facility setting slate. In particular, the term “optimization of efficiency” may mean that the industrial facility setting slate provides an efficiency according to the efficiency criterion which is no less than respective efficiencies which have been derived for a plurality of other possible industrial facility setting slates.

The term “industrial facility” may be defined as a physical entity (“physical infrastructure”) comprising one or more physical units (e.g., machinery, computer equipment or other equipment) arranged to act on (e.g., to generate, modify or rearrange) any one or more of: (i) data, (ii) at least one communication signal, (iii) at least one power signal and/or (iv) a plurality of physical elements. The industrial facility may be for producing data/signals/physical elements for a large number (e.g., at least 100, and typically many thousands) of individuals, who typically do not have ownership of the industrial facility. The industrial facility controls may comprise control parameters which modify the physical operation of the physical units and/or control parameters for modifying the operation of additional equipment (e.g., cooling equipment) which is used to maintain the operating state of the physical units. The physical units may be located in one geographical location (e.g., within one building) but may alternatively be geographically distributed. The “industrial facility” may for example be a data center, and the physical unit(s) may comprise server(s) for processing data received by the data center to generate, from the received data, modified data which is transmitted from the data center. Alternatively, the “industrial facility” may be a manufacturing or distribution center, and the physical unit(s) may comprise apparatus which acts on physical elements to modify them, and/or to assemble them, and/or to distribute them to locations outside the industrial facility. Alternatively, the “industrial facility” may be a station for generating a tele-communication signal (e.g., which is broadcast or multi-cast), or a power generation facility for generating a power signal, or a laboratory where physical elements are examined and/or modified to produce data.

A second embodiment may be a controller that performs the respective operations of the example method.

The second embodiment may be expressed as a system comprising:

one or more computers; and

one or more storage devices storing instructions that are operable, when executed on one or more computers, to cause the one or more computers to perform the operations of:

receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility setting slate defines a respective setting for each of a plurality of industrial facility controls;

determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and

in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.

The second embodiment may also be expressed as a computer program product (e.g., one or more non-transitory computer-readable storage mediums) comprising instructions (e.g., stored on the mediums) that are executable by a processing device and upon such execution cause the processing device to perform operations of:

receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility setting slate defines a respective setting for each of a plurality of industrial facility controls;

determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and

in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.

The second embodiment may also be expressed as a device for controlling physical infrastructure in an industrial facility, the device comprising:

a controller that performs the respective operations of:

-   receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility setting slate defines a respective setting for each of a plurality of industrial facility controls;
-   determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and
-   in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.

A third embodiment may be a system comprising: a machine learning system that receives state data characterizing a state of an industrial facility and predicts an industrial facility setting slate, which will optimize an efficiency of the industrial facility, wherein the industrial facility setting slate defines a respective setting for each of a plurality of industrial facility controls; a controller that determines whether the industrial facility settings defined by the industrial facility setting slate can be adopted by the industrial facility; and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopts the industrial facility settings defined by the industrial facility setting slate; and a proxy that facilitates (e.g., makes possible) a communication path between the machine learning system and the controller.

The proxy may comprise physical components (e.g., a physical interface to a communications network) and/or a communication protocol. In some implementations, the proxy is part of the system. In other implementations, the proxy communicates with the system.

In response to determining that the industrial facility settings cannot safely be adopted by the industrial facility, settings provided by a default control system may be adopted for the industrial facility.

The default control system may be a rule-based control system. Determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted may include determining whether each of the industrial facility settings defined by the industrial facility setting slate falls within an acceptable range or rate of change of the setting.
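The range and rate-of-change check, together with the fallback to the default control system, can be sketched as follows. This is only an illustrative sketch: the names ControlLimit, slate_is_safe, and choose_settings, the dictionary representation of a setting slate, and the treatment of an unknown control as unsafe are assumptions made for the example and are not taken from this specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ControlLimit:
    """Hypothetical safety envelope for one industrial facility control."""
    low: float       # smallest acceptable setting value
    high: float      # largest acceptable setting value
    max_step: float  # largest acceptable change from the current value


def slate_is_safe(slate, current, limits):
    """Return True only if every proposed setting is within its acceptable
    range and does not move too far from the setting currently in effect."""
    for name, proposed in slate.items():
        limit = limits.get(name)
        if limit is None or name not in current:
            return False  # assumption: an unknown control is treated as unsafe
        if not (limit.low <= proposed <= limit.high):
            return False  # outside the acceptable range
        if abs(proposed - current[name]) > limit.max_step:
            return False  # change is larger than the acceptable step
    return True


def choose_settings(slate, current, limits, default_settings):
    """Adopt the proposed slate if it passes the checks; otherwise fall back
    to the settings provided by the default (e.g., rule-based) control system."""
    if slate_is_safe(slate, current, limits):
        return slate
    return default_settings
```

In this sketch the rate of change is approximated by the size of a single step between the current and proposed values; a controller that evaluates slates at irregular intervals would likely divide by the elapsed time instead.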

Determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted may include determining whether predictions received from the machine learning system have become unstable or wrong.

Determining whether predictions received from the machine learning system have become unstable or wrong may include determining, for each of the industrial facility controls, whether a rate of change of recently predicted settings for the control has satisfied a threshold. The term “recently predicted” may refer to settings which were generated less than a certain (e.g., pre-defined) period before the determination is made.

Determining whether predictions received from the machine learning system have become unstable or wrong may include determining, for each of the industrial facility controls, whether a variance of recently predicted settings for the industrial facility control has satisfied a threshold.
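One possible form of the rate-of-change and variance tests for a single control is sketched below. The function name predictions_unstable, the representation of recent predictions as (timestamp, value) pairs, and the reading of "satisfied a threshold" as "exceeded a threshold" are assumptions made for illustration.

```python
from statistics import pvariance


def predictions_unstable(recent, max_variance, max_rate):
    """Flag one control as unstable based on its recently predicted settings.

    recent: list of (timestamp_seconds, value) pairs, oldest first, already
    restricted to the "recent" window (hypothetical representation).
    """
    if len(recent) < 2:
        return False  # not enough history to judge stability
    values = [value for _, value in recent]
    if pvariance(values) > max_variance:
        return True  # predictions are scattered too widely
    for (t0, v0), (t1, v1) in zip(recent, recent[1:]):
        dt = t1 - t0
        # Non-increasing timestamps are treated as a fault in this sketch.
        if dt <= 0 or abs(v1 - v0) / dt > max_rate:
            return True  # setting is changing too quickly
    return False
```

A controller could apply this test to each industrial facility control in turn and treat the whole slate as unsafe if any control is flagged.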

Determining whether predictions received from the machine learning system have become unstable or wrong may include determining that the predictions are simply incorrect, leading to inefficient or erroneous operation of the industrial facility.

Prior to adopting the industrial facility settings, state data characterizing the current state of the industrial facility may be received. A determination may be made, using the state data, as to whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted. The determination may include determining whether the current state of the industrial facility is suitable for adopting the industrial facility settings.

Determining whether the current state of the industrial facility is suitable for adopting the industrial facility settings may include determining whether any sensor readings identified in the state data fall outside of an (e.g., pre-determined) acceptable range for the sensor.

A determination may be made that no communications have been received from the machine learning system for more than a threshold amount of time and, in response, the industrial facility may be controlled using the default control system for the industrial facility.
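The communication timeout can be sketched as a simple watchdog. The class name HeartbeatWatchdog, its method names, and the injectable clock are hypothetical and serve only to illustrate the idea of reverting to the default control system after a period of silence.

```python
import time


class HeartbeatWatchdog:
    """Tracks when the machine learning system was last heard from."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self._timeout_s = timeout_s  # threshold amount of time, in seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = clock()

    def record_message(self):
        """Call whenever any communication arrives from the ML system."""
        self._last_seen = self._clock()

    def use_default_control(self):
        """True once the silence has lasted longer than the threshold."""
        return self._clock() - self._last_seen > self._timeout_s
```

A monotonic clock is used here so that adjustments to the wall-clock time cannot trigger or suppress the fallback.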

State data characterizing an updated state of the industrial facility may be sent to the machine learning system after the industrial facility settings have been adopted for use in generating a new predicted data setting slate.

The machine learning system may include a machine learning model that is a neural network. The machine learning model may be a deep neural network. The neural network may be trained using reinforcement learning based on measured or calculated efficiency of the industrial facility.

The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages.

A machine learning system may be able to automatically select process control set points for optimizing a desired objective function within an industrial facility. For example, in a data center, setting values may be selected to optimize power or other resource (e.g., water in the system) usage efficiency, machine health, and central processing unit utilization. For a power plant, setting values may be selected to optimize total power output and heat rate. For a manufacturing facility, setting values may be selected to optimize throughput, yield, and product quality.

Although the present specification gives examples in the context of data centers, the described techniques are equally applicable for any type of industrial facility, e.g., data centers, power plants, and manufacturing facilities. Thus, the described techniques can be used to improve the operation of industrial facilities generally.

Using a machine learning model that predicts safe, advantageous settings, the system may choose the settings without requiring user input or extensive testing. The system may continually optimize efficiency over time as the operating state or configuration of the industrial facility and its operating conditions change.

The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an example efficiency management system.

FIG. 2 shows an example of a control system with machine learning integration.

FIG. 3 is a flowchart of an example process for controlling industrial facility settings.

Like reference numbers and designations in the various drawings indicate like elements.

DETAILED DESCRIPTION

The specification generally describes a control system, integrated with a machine learning system, that provides direct control over industrial facility infrastructure to improve industrial facility efficiency. For example, a machine learning system may choose setting values for resources in the industrial facility that optimize any one or more of power usage efficiency, machine health, central processing unit utilization, and thermal margin, among other things. The setting values may optimize the efficiency of all of the facility or of only a designated portion of the facility, e.g., of a subset of the machinery in the facility.

In an industrial facility, many possible combinations of hardware, e.g., mechanical and electrical equipment, and software, e.g., control strategies and set points, contribute to industrial facility efficiency. For example, one of the primary sources of energy use in the industrial facility environment is cooling. Industrial facilities generate heat that must be removed to keep the servers running. Cooling is typically accomplished by large industrial equipment such as pumps, chillers, and cooling towers.

However, a simple change to a cold aisle temperature set point will produce load variations in the cooling infrastructure of the industrial facility, e.g., chillers, cooling towers, heat exchangers, and pumps. These load variations cause nonlinear changes in equipment efficiency. The number of possible operating configurations and various feedback loops among industrial facility equipment, equipment operation, and the industrial facility environment make it difficult to optimize energy efficiency. Testing each and every feature combination to maximize industrial facility efficiency is unfeasible given time constraints, frequent fluctuations in industrial facility sensor information and weather conditions, and the need to maintain a stable industrial facility environment. Traditional engineering formulas for predictive modeling often produce large errors because they fail to capture the complex interdependencies of systems in the industrial facility.

A machine learning system receives state data characterizing the current state of an industrial facility and provides updated industrial facility settings to a control system that manages the settings of the industrial facility. The machine learning system can be, for example, a machine learning system such as the one described in U.S. patent application Ser. No. 15/410,547 entitled OPTIMIZING DATA CENTER CONTROLS USING NEURAL NETWORKS, which was filed Jan. 19, 2017, the entire contents of which are hereby incorporated by reference herein.

FIG. 1 shows an example efficiency management system 100. The efficiency management system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

The efficiency management system 100 receives state data 140 characterizing the current state of a data center (or other industrial facility) 104 and provides updated settings 120 to a control system 102 that manages the settings of the data center 104.

The efficiency management system 100 can take in, as input, state data 140 representing the current state of the data center (or other industrial facility) 104. This state data can come from sensor readings of sensors in the data center 104 and operating scenarios within the data center 104. The state data may include data such as any one or more of temperatures, power, pump speeds, and set points.

The efficiency management system 100 uses this data to determine data center settings 120 that should be changed in the data center 104 in order to make the data center 104 more efficient.

Once the efficiency management system 100 determines the data center settings 120 that will make the data center 104 more efficient, the efficiency management system 100 provides the updated data center settings 120 to the control system 102. The control system 102 uses the updated data center settings 120 to set one or more data center values (control values) for controlling the data center. For example, if the efficiency management system 100 determines that an additional cooling tower should be turned on in the data center 104, the efficiency management system 100 can provide the updated data center settings 120 either to a user who updates the settings or to the control system 102, which automatically adopts the settings without user interaction. The control system 102 can send the signal to the data center to increase the number of cooling towers that are powered on and functioning in the data center 104.

The efficiency management system 100 can train an ensemble of machine learning models 132A-132N using a model training subsystem 160 to predict the resource efficiency of the data center 104 if particular data center settings are adopted. In some cases, the efficiency management system 100 can train a single machine learning model to predict the resource efficiency of the data center if particular data center settings are adopted.

In particular, each machine learning model 132A-132N in the ensemble is configured through training to receive a state input characterizing the current state of the data center 104 and a data center setting slate that defines a combination of possible data center settings and to process the state input and the data center setting slate to generate an efficiency score that characterizes a predicted resource efficiency of the data center if the data center settings defined by the data center setting slate are adopted.

In some implementations, the efficiency score represents a predicted power usage effectiveness (PUE) of the data center if the settings of a particular slate are adopted by the data center 104. PUE is defined as the ratio of the total building energy usage to the information technology energy usage.
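The PUE definition can be illustrated with a short calculation. The function name pue and the energy figures used in the usage note are hypothetical; only the ratio itself comes from the definition above.

```python
def pue(total_building_energy_kwh, it_energy_kwh):
    """Power usage effectiveness: total building energy usage divided by
    information technology energy usage, over the same time period."""
    if it_energy_kwh <= 0:
        raise ValueError("IT energy usage must be positive")
    return total_building_energy_kwh / it_energy_kwh
```

For instance, a facility that consumed 1200 kWh in total while its information technology equipment consumed 1000 kWh would have a PUE of 1.2; values closer to 1 indicate that less energy is spent on overhead such as cooling.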

In some implementations, the efficiency score represents a predicted water usage of the data center if the settings of a particular slate are adopted by the data center 104. In other implementations, the efficiency score represents a predicted monetary amount spent on electricity. In other implementations, the efficiency score represents a predicted load amount that can be achieved by a data center.

In some implementations, each machine learning model (132A-132N) is a neural network, e.g., a deep neural network, that the efficiency management system 100 can train to produce an efficiency score.

Neural networks are machine learning models that employ one or more layers of models to generate an output, e.g., one or more classifications, for a received input. Deep neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer. Each layer of the neural network generates an output from a received input in accordance with current values of a respective set of parameters for the layer.

The model training subsystem 160 uses historical data from a data center 104 to create different datasets of sensor data from the data center. Each machine learning model 132A-132N in the ensemble of machine learning models can be trained on one dataset of historical sensor data.

The efficiency management system 100 can train additional ensembles of constraint machine learning models 112A-112N using the model training subsystem 160 to predict an operating property of the data center that corresponds to an operating constraint if the data center 104 adopts certain data center settings 120.

If the efficiency management system 100 determines that a constraint model predicts that the value of a given data center setting will violate a constraint of the data center, the efficiency management system will discard the violating setting.

Each constraint model 112A-112N is a machine learning model, e.g., a deep neural network, that is trained to predict certain values of an operating property of the data center over a period of time if the data center adopts a given input setting. For example, the model training subsystem 160 can train one constraint model to predict the future water temperature of the data center over the next hour given input state data 140 and potential settings. The model training subsystem 160 can train another constraint model to predict the water pressure over the next hour given the state data 140 and potential settings.

A setting slate management subsystem 110 within the efficiency management system 100 preprocesses the state data 140 and constructs a set of setting slates that represent one or more (typically a plurality of) data center setting values that can be set for various parts of the data center given the known operating conditions and the current state of the data center 104. Each setting slate defines a respective combination of possible data center settings that affect the efficiency of the data center 104.

For example, the efficiency management system 100 may determine the most resource efficient settings for a cooling system of the data center 104. The cooling system may have the following architecture: (1) servers heat up the air on the server floor; (2) the air is cycled and the heat is transferred to the process water system; (3) the process water system is cycled and connects to the condenser water system using a heat sink; and (4) the condenser water system takes the heat from the process water system and transfers it to the outside air using cooling towers or large fans.

To efficiently control the cooling system, the efficiency management system 100 may construct different potential setting slates that include various temperatures for the cooling tower set points, cooling tower bypass valve positions, cooling unit condenser water pump speeds, a number of cooling units running, and/or process water differential pressure set points. As an example, one setting slate may include the following values: 68 degrees as the temperature for the cooling tower set points, 27 degrees as the cooling tower bypass valve position, 500 rpm as the cooling unit condenser water pump speed, and 10 as the number of cooling units running.

Other examples of slate settings that impact efficiency of the data center 104 include: potential power usage across various parts of the data center; certain temperature settings across the data center; a given water pressure; specific fan or pump speeds; and a number and type of the running data center equipment such as cooling towers and water pumps.

During preprocessing, the setting slate management subsystem 110 can modify the state data 140. For example, it may remove data with invalid power usage efficiency, replace missing data for a given data setting with a mean value for that data setting, and/or remove a percentage of data settings. The setting slate management subsystem 110 discretizes all of the action dimensions and generates an exhaustive set of possible action combinations. For any continuous action dimensions, the system converts the action into a discrete set of possible values. For example, if one of the action dimensions is a valve that has a value from 0.0 to 1.0, the system may discretize the values into the set [0.0, 0.05, 0.1, 0.15, . . . , 1.0]. The system may discretize for every dimension, and the full action set is every possible combination of the values. The system then removes all actions that violate the constraint models.
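The discretization and exhaustive enumeration described above can be sketched as follows. The function names discretize and all_slates and the dimension names in the usage note are hypothetical; the valve range and the 0.05 step are taken from the example in the text.

```python
from itertools import product


def discretize(low, high, step):
    """Evenly spaced values from low to high inclusive. Values are rounded
    so that accumulated floating-point drift does not leak into the slates."""
    count = int(round((high - low) / step))
    return [round(low + i * step, 10) for i in range(count + 1)]


def all_slates(dimensions):
    """Yield every combination of discrete values as a dict.

    dimensions: dict mapping a control name to its list of discrete values.
    """
    names = list(dimensions)
    for combo in product(*(dimensions[name] for name in names)):
        yield dict(zip(names, combo))
```

For example, discretize(0.0, 1.0, 0.05) yields 21 valve positions, and combining them with three possible counts of running cooling units yields 63 candidate slates. The number of combinations grows multiplicatively with each added dimension, which is one reason the constraint filtering step that follows matters in practice.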

The setting slate management subsystem 110 sends the constructed set of setting slates and the current state of the data center 104 to the constraint models 112A-112N. The setting slate management subsystem then determines whether certain data center setting slates, if chosen by the system, are predicted to result in violations of operating constraints for the data center. The setting slate management subsystem 110 removes any data center setting slates from the set of setting slates that are predicted to violate the constraints of the data center.
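The removal of constraint-violating slates can be sketched as a filter. The function name filter_slates and the representation of each constraint as a (predict, low, high) tuple are assumptions made for illustration; in particular, each predict callable stands in for a trained constraint model and is assumed to return the predicted values of one operating property over the prediction horizon.

```python
def filter_slates(slates, state, constraints):
    """Keep only the slates that no constraint model predicts will violate
    its operating constraint.

    constraints: list of (predict, low, high) tuples, where
    predict(state, slate) returns an iterable of predicted values for one
    operating property (e.g., water temperature over the next hour).
    """
    kept = []
    for slate in slates:
        within_limits = all(
            low <= value <= high
            for predict, low, high in constraints
            for value in predict(state, slate)
        )
        if within_limits:
            kept.append(slate)
    return kept
```

A slate is discarded as soon as any predicted value for any operating property falls outside its bounds, which mirrors the conservative behavior described above.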

The efficiency management system 100 sends the updated set of setting slates and the state data 140 to the ensemble of machine learning models 132A-132N that use the state data and the setting slates to generate efficiency scores as output.

Since each machine learning model in the ensemble of models 132A-132N is trained on a different dataset than the other models, each model has the potential to provide a different predicted PUE output when all the machine learning models in the ensemble are run with the same data center setting values as input. Additionally or alternatively, each machine learning model may have a different architecture, which could also cause each model to potentially provide a different predicted PUE output.

The efficiency management system 100 can choose data center setting values that focus on long-term efficiency of the data center. For example, some data center setting values provide long-term power usage efficiency for the data center, e.g., ensuring that the power usage in the data center is efficient for a long predetermined time after the data center was in the state characterized by the state input. Long-term power usage efficiency may be for time durations of at least ten minutes, such as thirty minutes, one hour, or longer from the time the data center was in the input state, whereas short-term power usage efficiency focuses on a short time (e.g., less than ten minutes) after the data center was in the input state, e.g., immediately after, or five seconds after, the data center was in the input state.

The system can optimize the machine learning models for long-term efficiency so that the models can make predictions based on the dynamics of the data center and are less likely to provide recommendations for slate settings that yield good results in the short term, but are bad for efficiency over the long term. For example, the system can predict PUE over the next day, assuming that optimal actions will continue to be taken every hour. The system can then take actions that it knows will lead to the best PUE over the whole day, even if the PUE for a given hour is worse than the previous hour.

The efficiency management system 100 determines the final efficiency score for a given setting slate based on the efficiency scores of each machine learning model in the ensemble of models for a given setting slate to produce one overall efficiency score per setting slate.
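The text does not state how the per-model scores are combined. As one possible sketch, the example below assumes that the overall score is the arithmetic mean of the ensemble outputs and that a lower score is better, as it would be for PUE. The function names overall_score and best_slate, and the representation of each model as a callable taking a state and a slate, are hypothetical.

```python
from statistics import mean


def overall_score(models, state, slate):
    """Combine the ensemble's efficiency scores for one slate.
    Assumption: a plain average; other aggregations are equally plausible."""
    return mean(model(state, slate) for model in models)


def best_slate(models, state, slates):
    """Pick the slate with the lowest overall score (lower is better for a
    PUE-style score). Raises ValueError if no slates remain."""
    return min(slates, key=lambda slate: overall_score(models, state, slate))
```

For a score where higher is better, such as an achievable load amount, the selection would use max instead of min.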

The efficiency management system 100 then either recommends or selectsnew values for the data center settings based on the efficiency scoresassigned to each slate from the machine learning models 132A-132N. Theefficiency management system can send the recommendations to a datacenter operator to be implemented, e.g., by being presented to the datacenter operator on a user computer, or set automatically without needingto be sent to a data center operator.

In some implementations, the machine learning system may be acloud-based artificial intelligence system. There may be a proxy betweenthe machine learning system and the control system that allows thecontrol system to communicate with the cloud-based AI, e.g., over atelecommunication system. The proxy sends the recommended industrialfacility settings from the machine learning system to the controlsystem. The proxy can use a communication protocol, such as Modbus, tofacilitate communication.

In certain situations, using the predictions generated by the machine learning system of FIG. 1 may result in instability or hazardous conditions in the data center. In these situations, a control system should be used to determine safe settings that can be adopted by the data center or industrial facility without causing the data center or industrial facility environment to become dangerous or unstable.

FIG. 2 shows an example of such a control system 202. The control system 202 is an example of a system implemented as computer programs on one or more computers in one or more locations, in which the systems, components, and techniques described below can be implemented.

The control system 202 receives state data 240 describing the industrial facility 204. The state data 240 can come from sensor readings of sensors in the industrial facility 204 and operating scenarios within the industrial facility 204. The state data may include data such as temperatures, power, pump speeds, and set points. As illustrated in FIG. 2, the control system is within the industrial facility 204. However, in some implementations the control system can be separate from the industrial facility 204 and communicate with the industrial facility through a proxy or some other communication mechanism.

The control system 202 may send state data 240 to the machine learning system 200. The system 202 receives proposed industrial facility settings 220 and, optionally, a heartbeat signal 260 from the machine learning system 200. The industrial facility settings 220 are settings that the machine learning system 200 determines will make the industrial facility 204 more efficient with respect to one or more metrics that the machine learning system has been trained to optimize. The settings may be in the form of a setting slate that defines a setting value for each of the industrial facility control settings.

For example, the settings may be for a cooling system of the industrial facility 204, as similarly described above. The cooling system may have the following architecture: (1) servers heat up the air on the server floor; (2) the air is cycled and the heat is transferred to the process water system; (3) the process water system is cycled and connects to the condenser water system using a heat sink; and (4) the condenser water system takes the heat from the process water system and transfers it to the outside air using cooling towers or large fans. Settings can include various temperatures for cooling tower set points, cooling tower bypass valve positions, cooling unit condenser water pump speeds, a number of cooling units running, and/or process water differential pressure set points.

Efficiency may be measured in terms of one of several cost functions, including power or other resource (e.g., water in the system) usage efficiency, machine health, central processing unit utilization, and thermal margin. The heartbeat signal 260 verifies communication between the control system 202 and the real-time data network of the machine learning system 200.

The control system 202 can use the updated industrial facility settings 220 from the machine learning system 200 to set the industrial facility 204 values. For example, if the machine learning system 200 determines that an additional cooling tower should be turned on in the industrial facility 204, the machine learning system 200 can provide the updated industrial facility settings 220 to the control system 202. As illustrated in FIG. 2, the control system 202 communicates directly with the machine learning system 200. However, in some implementations the control system communicates with the machine learning system through a proxy or some other communication mechanism.

The controller 222 of the control system 202 determines whether the settings are safe to automatically adopt without user interaction. If the settings are safe to adopt, the control system 202 sets controls of the industrial facility 204 to settings 225 that are the same as the industrial facility settings 220 which the control system 202 received from the machine learning system 200. For example, the controls may be electronically configurable and communicatively coupled to the control system 202, e.g., through a wired or wireless connection. The control system 202 can then send a signal that causes the controls to be set to settings 225 that are the same as the industrial facility settings 220.

In this example, the control system 202 can send the signal to the industrial facility 204 to increase the number of cooling towers that are powered on and functioning in the industrial facility 204 if the control system 202 determines that the settings are safe to adopt.

However, if the controller 222 determines that the settings received from the machine learning system 200 are unsafe to adopt, the control system 202 may hold the industrial facility at the last known good industrial facility settings and begin adopting settings provided by a default control system 232 for the industrial facility 204. The controller then sends the default control system 232 settings to the industrial facility 204 as the industrial facility settings 225.

FIG. 3 illustrates an example flow diagram of the determination by the controller 222 regarding whether to use industrial facility settings from the machine learning system 200 or to use default control and settings from a default control system. In some cases, the default control system uses rules and heuristics to set industrial facility values, i.e., the manner in which the default control system selects settings is hard-coded and does not use machine learning.

The controller 222 uses one or more criteria to determine whether to use industrial facility settings from the machine learning system or to use default control and settings. The controller can check to see if a machine learning mode that distinguishes between using machine learning system industrial facility settings and default settings from a default control system has been disabled. The machine learning mode can be disabled by an industrial facility operator manually or by the machine learning system through a mode setting. If the machine learning mode has been disabled, the controller will go into default mode and use the default control system to set the industrial facility settings. The controller will also enter default mode and use default control system settings when there are equipment failures in the industrial facility. If communication is lost between the controller and other controllers or equipment, the controller may also revert to default controls.
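
The mode-selection criteria above can be sketched as a small decision function. The class, field, and mode names are hypothetical. The sketch also assumes that lost communication always forces default mode, whereas the description says only that the controller may revert in that case.

```python
from dataclasses import dataclass


@dataclass
class ControllerStatus:
    """Hypothetical snapshot of the conditions the controller checks."""
    ml_mode_enabled: bool      # False if an operator or the ML system disabled it
    equipment_failure: bool    # True if equipment in the facility has failed
    communication_lost: bool   # True if links to other controllers/equipment are down


def select_mode(status):
    """Return "default" or "machine_learning" for the given status."""
    if not status.ml_mode_enabled:
        return "default"
    if status.equipment_failure:
        return "default"
    # Assumption: always revert on lost communication (the text says "may").
    if status.communication_lost:
        return "default"
    return "machine_learning"
```

Being in machine learning mode does not by itself mean that proposed settings are adopted; the safety determinations described next still apply.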

When the controller 222 is in machine learning mode, the controller may determine whether the industrial facility settings proposed by the machine learning system are safe to implement in the industrial facility.

As illustrated in FIG. 3, an example controller 222 receives industrial facility settings 220 from the machine learning system and state data that includes sensor data 345 from the industrial facility. The controller 222 can also optionally receive the heartbeat signal 260 from the machine learning system.

The controller 222 determines whether the industrial facility can safely adopt the industrial facility settings proposed by the machine learning system 200. The controller may perform this determination by comparing each setting in the received industrial facility settings with an acceptable value or range of values that has been predefined for the setting. The controller determines whether each setting is an appropriate value or falls within the acceptable range for the setting. If the settings are within the acceptable value or range of values, the controller determines that the industrial facility can safely adopt the settings proposed by the machine learning system. For example, the industrial facility settings proposed by the machine learning system may include a setting indicating that an additional cooling tower should be turned on in the industrial facility. The acceptable number of cooling towers that are on at a given time in the industrial facility may be 10. If 10 are currently on, the setting indicating that an additional cooling tower should be turned on will cause 11 cooling towers to be turned on at a given time in the industrial facility. Since 11 is greater than the defined appropriate value of 10, the controller determines that the industrial facility cannot safely adopt the industrial facility settings.
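
The range comparison can be sketched as follows, using the cooling tower example. The setting name `cooling_towers_on`, the lower bound of 0, and the choice to treat a setting with no predefined limits as a violation are assumptions made for illustration.

```python
def out_of_range(proposed, limits):
    """Return the names of proposed settings that fall outside their limits.

    proposed maps a setting name to its proposed value.
    limits maps a setting name to an inclusive (low, high) range.
    Assumption: a setting with no predefined limits is treated as unsafe.
    """
    violations = []
    for name, value in proposed.items():
        bounds = limits.get(name)
        if bounds is None:
            violations.append(name)
            continue
        low, high = bounds
        if not (low <= value <= high):
            violations.append(name)
    return violations


# At most 10 cooling towers may be on at a given time.
limits = {"cooling_towers_on": (0, 10)}
# 10 are currently on and the slate asks for one more.
violations = out_of_range({"cooling_towers_on": 11}, limits)
can_safely_adopt = not violations
```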

Additionally or alternatively, in some implementations the industrial facility determines whether the industrial facility settings proposed by the machine learning system are safe to adopt by determining whether the predictions received from the machine learning model have become unstable. For example, stability may be determined by a rate of change or variance of a setting value. If an industrial facility setting changes and/or varies by at least a predetermined threshold amount, the setting value is considered unstable. The system computes a rate of change or a variance across the most recently predicted values for the setting by the machine learning system. If the rate of change or variance exceeds a defined threshold, the system determines that the predictions from the machine learning system for the setting value have become unstable.
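
A minimal sketch of the stability test for one setting is shown below. The function and parameter names, the fixed prediction interval, the use of population variance, and the example thresholds are assumptions; the specification does not fix how many recent predictions are considered or how the thresholds are chosen.

```python
from statistics import pvariance


def is_unstable(recent_values, interval_s, max_rate, max_variance):
    """Return True if recent predictions for one setting look unstable.

    recent_values: most recent predicted values, oldest first.
    interval_s: assumed constant time between predictions, in seconds.
    max_rate: threshold on absolute rate of change (units per second).
    max_variance: threshold on the variance of the recent values.
    """
    if len(recent_values) < 2:
        return False  # not enough history to judge
    rates = [
        abs(later - earlier) / interval_s
        for earlier, later in zip(recent_values, recent_values[1:])
    ]
    return max(rates) >= max_rate or pvariance(recent_values) >= max_variance


# Hypothetical set point predictions received one minute apart.
steady = is_unstable([20.0, 20.1, 20.0, 20.1], 60.0, 0.05, 1.0)
erratic = is_unstable([20.0, 26.0, 19.0, 27.0], 60.0, 0.05, 1.0)
```

The two checks are combined with a logical OR here; an implementation could equally apply only one of them.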

When industrial facility settings are determined to be unstable, the controller 222 may hold the industrial facility at the last known good values and transition to the default control system 232 for new industrial facility settings.

The controller 222 can additionally or alternatively receive state data 240 that includes data characterizing the current state of the industrial facility and determine whether the current state of the industrial facility is suitable for adopting the industrial facility settings proposed by the machine learning system. In some cases, the proposed industrial facility settings would be unsafe to implement given the state of the industrial facility. This suitability determination may be made by determining whether any sensor readings identified in the state data fall outside an acceptable value or an acceptable range for the sensor. For example, one of the sensor readings may be the current temperature at some point within the facility. If the current temperature reading exceeds a threshold, the industrial facility is considered to be in an unsafe state for direct control by the machine learning system. Therefore, rather than adopting the proposed industrial facility settings, the controller will hold the industrial facility at the last known good values and transition to the default control system for new industrial facility settings.
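
The suitability determination can be sketched as a check of each sensor reading against its acceptable range. The sensor name, the temperature limits, and the decision to skip sensors that have no configured range are assumptions for illustration.

```python
def state_is_suitable(sensor_readings, sensor_limits):
    """Return True if no reading falls outside its acceptable range.

    sensor_readings maps a sensor name to its current reading.
    sensor_limits maps a sensor name to an inclusive (low, high) range.
    Assumption: sensors without a configured range are not checked.
    """
    for sensor, reading in sensor_readings.items():
        if sensor not in sensor_limits:
            continue
        low, high = sensor_limits[sensor]
        if reading < low or reading > high:
            return False
    return True


# Hypothetical temperature limit, in degrees Celsius.
sensor_limits = {"server_floor_temp_c": (15.0, 32.0)}
suitable = state_is_suitable({"server_floor_temp_c": 35.5}, sensor_limits)
```

When the function returns False, the controller would hold the last known good values and hand control to the default control system, as described above.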

The controller 222 can use the optional heartbeat signal 260 to determine whether there is communication between the controller and the machine learning system. If it is determined that no communication has been received by the controller from the machine learning system for more than a predefined threshold amount of time 304, the controller will control the industrial facility using the default control system 232 for the industrial facility 314 and may disable the machine learning control mode 312.
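
The heartbeat timeout can be sketched as a small watchdog. The class name, method names, and the 30-second threshold are hypothetical. Timestamps are passed in explicitly so the logic is easy to exercise; a deployment might read them from a monotonic clock instead.

```python
class HeartbeatMonitor:
    """Tracks when the last heartbeat arrived from the machine learning system."""

    def __init__(self, timeout_s, now):
        self.timeout_s = timeout_s  # predefined threshold amount of time
        self.last_seen = now        # time of the most recent heartbeat

    def record(self, now):
        """Call whenever a heartbeat signal is received."""
        self.last_seen = now

    def timed_out(self, now):
        """True if no heartbeat has arrived for more than timeout_s seconds."""
        return now - self.last_seen > self.timeout_s


monitor = HeartbeatMonitor(timeout_s=30.0, now=0.0)
monitor.record(now=20.0)
# 31 seconds of silence exceeds the threshold, so fall back to default control.
use_default_control = monitor.timed_out(now=51.0)
```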

Embodiments of the subject matter and the functional operations described in this specification can be implemented in digital electronic circuitry, in tangibly-embodied computer software or firmware, in computer hardware, including the structures disclosed in this specification and their structural equivalents, or in combinations of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus. The computer storage medium can be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them. Alternatively or in addition, the program instructions can be encoded on an artificially-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus for execution by a data processing apparatus.

The term “data processing apparatus” refers to data processing hardware and encompasses all kinds of apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can also be, or further include, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit). The apparatus can optionally include, in addition to hardware, code that creates an execution environment for computer programs, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them.

A computer program, which may also be referred to or described as a program, software, a software application, an app, a module, a software module, a script, or code, can be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages; and it can be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A program may, but need not, correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data, e.g., one or more scripts stored in a markup language document, in a single file dedicated to the program in question, or in multiple coordinated files, e.g., files that store one or more modules, sub-programs, or portions of code. A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a data communication network.

The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by special purpose logic circuitry, e.g., an FPGA or an ASIC, or by a combination of special purpose logic circuitry and one or more programmed computers.

Computers suitable for the execution of a computer program can be based on general or special purpose microprocessors or both, or any other kind of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data. The central processing unit and the memory can be supplemented by, or incorporated in, special purpose logic circuitry. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer need not have such devices. Moreover, a computer can be embedded in another device, e.g., a mobile telephone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device, e.g., a universal serial bus (USB) flash drive, to name just a few.

Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto-optical disks; and CD-ROM and DVD-ROM disks.

To provide for interaction with a user, embodiments of the subject matter described in this specification can be implemented on a computer having a display device, e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor, for displaying information to the user and a keyboard and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback, e.g., visual feedback, auditory feedback, or tactile feedback; and input from the user can be received in any form, including acoustic, speech, or tactile input. In addition, a computer can interact with a user by sending documents to and receiving documents from a device that is used by the user; for example, by sending web pages to a web browser on a user's device in response to requests received from the web browser. Also, a computer can interact with a user by sending text messages or other forms of message to a personal device, e.g., a smartphone, running a messaging application, and receiving responsive messages from the user in return.

Embodiments of the subject matter described in this specification can be implemented in a computing system that includes a back-end component, e.g., as a data server, or that includes a middleware component, e.g., an application server, or that includes a front-end component, e.g., a client computer having a graphical user interface, a web browser, or an app through which a user can interact with an implementation of the subject matter described in this specification, or any combination of one or more such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication, e.g., a communication network. Examples of communication networks include a local area network (LAN) and a wide area network (WAN), e.g., the Internet.

The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. In some embodiments, a server transmits data, e.g., an HTML page, to a user device, e.g., for purposes of displaying data to and receiving user input from a user interacting with the device, which acts as a client. Data generated at the user device, e.g., a result of the user interaction, can be received at the server from the device.

While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or on the scope of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially be claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.

Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous.

What is claimed is:

1. A method comprising: receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility settings slate defines a respective setting for each of a plurality of industrial facility controls; determining whether industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.
2. The method of claim 1, further comprising: in response to determining that the industrial facility settings cannot safely be adopted by the industrial facility, adopting settings provided by a default control system for the industrial facility.
3. The method of claim 2, wherein the default control system is a rule-based control system.
4. The method of claim 1, wherein determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted comprises: determining whether each of the industrial facility settings defined by the industrial facility setting slate falls within an acceptable range for the industrial facility setting.
5. The method of claim 1, wherein determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted further comprises: determining whether predictions received from the machine learning system have become unstable.
6. The method of claim 5, wherein determining whether predictions received from the machine learning system have become unstable comprises: determining, for each of the industrial facility controls, whether a rate of change of recently predicted settings for the industrial facility control has satisfied a threshold.
7. The method of claim 5, wherein determining whether predictions received from the machine learning system have become unstable comprises: determining, for each of the industrial facility controls, whether a variance of recently predicted settings for the industrial facility control has satisfied a threshold.
8. The method of claim 1, further comprising: prior to adopting the industrial facility settings, receiving state data characterizing a current state of the industrial facility; and wherein determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted comprises: determining whether the current state of the industrial facility is suitable for adopting the industrial facility settings.
9. The method of claim 8, wherein determining whether the current state of the industrial facility is suitable for adopting the industrial facility settings comprises determining whether any sensor readings by a sensor identified in the state data fall outside of an acceptable range for the sensor.
10. The method of claim 1, further comprising: determining that no communications have been received from the machine learning system for more than a threshold amount of time; and in response, controlling the industrial facility using a default control system for the industrial facility.
11. The method of claim 1, further comprising: sending state data characterizing an updated state of the industrial facility to the machine learning system after the industrial facility settings have been adopted for use in generating a new predicted data setting slate.
12. The method of claim 1, wherein the machine learning system includes a machine learning model that is a neural network.
13. The method of claim 12, wherein the machine learning model is a deep neural network.
14. The method of claim 12, wherein the neural network has been trained using reinforcement learning based on measured or calculated efficiency of the industrial facility.
15. A system comprising: one or more computers; and one or more storage devices storing instructions that are operable, when executed on one or more computers, to cause the one or more computers to perform operations comprising: receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility settings slate defines a respective setting for each of a plurality of industrial facility controls; determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.
16. The system of claim 15, the operations further comprising: in response to determining that the industrial facility settings cannot safely be adopted by the industrial facility, adopting settings provided by a default control system for the industrial facility.
17. The system of claim 15, wherein determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted further comprises: determining whether predictions received from the machine learning system have become unstable.
18. A computer program product comprising instructions that are executable by a processing device and upon such execution cause the processing device to perform operations of: receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility settings slate defines a respective setting for each of a plurality of industrial facility controls; determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.
19. The computer program product of claim 18, the operations further comprising: in response to determining that the industrial facility settings cannot safely be adopted by the industrial facility, adopting settings provided by a default control system for the industrial facility.
20. A device for controlling physical infrastructure in an industrial facility, the device comprising: a controller that performs operations comprising: receiving, from a machine learning system, an industrial facility setting slate that the machine learning system predicts will optimize an efficiency of an industrial facility, wherein the industrial facility settings slate defines a respective setting for each of a plurality of industrial facility controls; determining whether the industrial facility settings defined by the industrial facility setting slate can safely be adopted by the industrial facility; and in response to determining that the industrial facility settings defined by the industrial facility setting slate can safely be adopted, adopting the industrial facility settings defined by the industrial facility setting slate.